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Abstract Assessing variability according to distinct factors in data is a funda- 
mental technique of statistics. The method commonly regarded to as analysis of 
variance (ANOVA) is, however, typically confined to the case where all levels of 
a factor are present in the data (i.e. the population of factor levels has been ex- 
hausted). Random and mixed effects models are used for more elaborate cases, but 
require distinct nomenclature, concepts and theory, as well as distinct inferential 
procedures. Following a hierarchical Bayesian approach, a comprehensive ANOVA 
framework is shown, which unifies the above statistical models, emphasizes practi- 
cal rather than statistical significance, addresses issues of parameter identifiability 
for random effects, and provides straightforward computational procedures for in- 
ferential steps. Although this is done in a rigorous manner the contents herein can 
be seen as ideological in supporting a shift in the approach taken towards analysis 
of variance. 

Keywords: ANOVA; fixed effects; random effects; variance components; hierar- 
chical Bayes; multilevel model; constraints 



1 Introduction 



As an independent field of study Statistics is rather young. Many of the methods, tech- 
niques, and philosophies can be attributed to a handful of statisticians during the first 



half of the twentieth century. Among these, R.A. Fisher is often recognized as having had 
a profound influence on the field. It has been said that Fisher single-handedly created 



the foundations of modern statistical science (Hald, 1998). For statisticians the first con 



tribution that comes to mind is his work in development of likelihood theory. However, 
for the greater scientific community, one might consider his formulation of analysis of 
variance as the most significant contribution. 



As much of Fisher's work was in agriculture, an apt example to consider is the one-way 
ANOVA, Yij = fi + ai + eij, in which observations are on crop yield, with i,j representing 
the j*^ plant receiving fertilizer treatment i. An appropriate decomposition of the data 
should then reveal the variability due to different fertilizers while accounting for variabil- 
ity within plant groups that receive the same type of fertilizer treatment. Thus, analysis 
of variance is essentially a pragmatic decomposition of the data. In correspondence Fisher 



has been cited (Searle et al, 1992) to have said 



"The analysis of variance is (not a mathematical theorem but) a simple 
method of arranging arithmetical facts so as to isolate and display the es- 
sential features of a body of data with the utmost simplicity." 

The elegance and power of this methodology is perhaps what has caused ANOVA to 
become so popular in nearly all areas of scientific research. However, along with the 
ubiquitous support of the methodology has come a pervasive reliance on its conclusory 
result, the p-value. Recognition of this problem is not new. It has been long noted 
by researchers in other fields that the hypothesis-based point of view, which relies on 



statistical significance, should be amended. (Yoccoz, 1991; Fidler et al, 2004 loanni 



dis, 2005). The statistical community has also long acknowledged the need to provide 



methodologies that are first and foremost, "of use to scientists in making quantitative 



inferences," (Nelder 1999). The problem is that the standard methods that continue to 



be imparted on students focus on statistical significance. As stated by Savage (1957), a 



method that does so "simply reflects the size of the sample and the power of the test, and 
is not a contribution to science." Thus, any standard, or default methodology that aims 
to decompose variation present in a set of observations according to factors of interest. 



should be able to address practical significance as well. 

In addition to the base objective of analysis of variance, to decompose variation in 
observations according to distinct sources of variability, a default method used in ini- 
tial/exploratory work should accomplish the following. 

• Allow for each factor to simultaneously consider variability due to the observed set 
of effects (finite population variance), as well as the variability from unobserved 
effects (superpopulation variance), thereby permitting greater flexibility in model 
choice with regards to flxed or random effects. 

• Facilitate comparison of magnitude of variability across all factors in the model, 
including errors, so that attention may be given to practical signiflcance as well as 
statistical significance of a factor. 

• Provide ability to consider both magnitude and uncertainty of variance parameters 
in the model, by providing confidence, or uncertainty intervals in a default analysis 
summary. 

These are precisely the goals of the analysis of variance framework proposed in this paper. 
While the primary contribution may be seen as ideological in nature, there are technical 
issues that are addressed to allow for a shift in the standard approach taken towards the 
basic method of analysis of variance. By standard approach one may assume the tabular 
analysis of variance summary and its accompanying test of statistical significance. 

The organization of the paper is as follows. Section |2] covers basic concepts of stan- 
dard methods that are both widely taught and employed, as well as recent shifts in the 
practice of ANOVA. Section [3] presents an alternative framework of ANOVA along with 
modifications to the standard ANOVA table summary. Section |4] illustrates our method 
and compares it to the classical approaches. In particular, we present an example in which 
classical ANOVA yields identical p-values for two cases; one in which the factor under 
investigation has low practical significance, and one with high practical significance. 
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2 Background 

Following Fisher's analysis of variance overall uncertainty is attributed to distinct factors 
of an experiment through the use of a sum of squares decomposition. This is now shown 
with the balanced one-way analysis of variance model 

Yij = n + ai + eij, i = l,...,ni, j = l,...,nj. (1) 

As a seminal example consider observations that are on crop yield with i,j represent- 
ing the j*^ plant receiving fertilizer treatment i. More generally the indices represent 
a factor level i and replicate j. The appropriate decomposition of the data, which re- 
veals variability due to different fertilizers while accounting for variability within plant 
groups that receive the same type of fertilizer treatment, is done with the arithmetical 
arrangement that summarizes yield for each type, Fj, = Uj^ Y^j, and for overall yield, 
Y = Yij = nj^ , where n = rij ■ nj. Observations Y^j are decomposed 

with the identity 

Y,, - Y„ = (F,, - FO + (F,. - F. .). (2) 

Terms are then squared and summed, noting that the cross term on the right hand side 
equals zero, so that a decomposition of the mean-adjusted sums of squares is 



— V V ^ 

SST SSE SSA 



where Yi, is the mean within group i and F.. is the mean of all observations. The terms 
SST, SSA, and SSE denote total (adjusted) sum of squares, sum of squares among groups, 
and sum of squared errors, respectively. Note that each of these terms is itself a sum 
of squares that is analogous to a sample variance = Yli=ii^i ^ ^ set of 

independent observations xi, . . . ,Xk, and is thus proportional to a distribution with 
appropriate degrees of freedom. Fisher showed that SSA and SSE are both proportional 
to distributions, with nj — l and n — nj degrees of freedom, respectively, and that they 



are independent, the general result of which is due to Cochran (1934). 

While this classical methodology provides a means to examine statistical significance, 
it does not provide any formal assessment of practical significance. Loosely speaking. 



practical significance can be considered as a contextual basis that allows data-specific 
conclusions to be drawn, i.e. evidence that SSA is substantial compared, not only to zero, 
but to SSE as well. Practical significance in the example above implies that the variability 
due to the fertilizer treatment is not only significantly different than no treatment, but 
that when compared to plant-to-plant variability it is still significant. One contribution of 
this paper is in attempting to formalize a statistical methodology that rigorously provides 
a method of assessing practical significance. 



2.1 Conventional Methods 

A fixed effects model generally refers to the case when the observations have exhausted 
the population of factor levels (e.g. treatments), or when interest lies only with the factor 
levels that have been observed. Alternatively, random effects models are employed when 
it is assumed that the factor levels are a subset of a greater population of possible levels. 



This definition provided by Hoaglin et al. (1991, p. 195) is somewhat more explicit than 



that given by Eisenhart (1947), in which the effects of a model are considered to be fixed 
when they are all nonrandom, and considered to be random when they are all random. 
There exist many other definitions in the literature, some of which are not compatible. 



See Gelman (2005) for a summary. 



2.1.1 Fixed Effects 

Consider the model given by ([T| such that i = l,...,n/ denotes the factor level or 
treatment, and j = 1, . . . ,nj denotes replications or errors. Observations are assumed 
to be independent across replicates as well as across factor levels. Additionally, it is 
generally assumed that 

e,, N{0, a!). (4) 

Analysis of variance generally aims to test the hypothesis that there is no difference 
among the treatments, 

Hq : ai = ■ ■ ■ = = 0, (5) 
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against the alternative hypothesis that at least one treatment level differs. The test is 
a result of the sums of squares decomposition in (|3|, since ^ ~ Xn-m (under the 
null hypothesis) ~ Xn -i) where n = rij ■ rij. The expectation of these two terms 
is ^j^y^-j- J2i ^1. + '^e ^^'i 5 respectively. The test of Hq is then carried out using the F 
distributed ratio where MSA = ^ and MSE = The term MSA is central 

Xnj-1 distributed when ([s]) is true, and non-central with shift of ^^yrx^i*^? + when 
false. 



The results described are concisely displayed in a tabular format (Fisher, 1925), as 
seen in Table [l] The table culminates with ([s]) being tested based on the p-value of 
p = Pr(F„j-_i > F), which does not give any indication of the practical significance. 
And despite recognition of the need to focus on effect sizes and confidence intervals 



(Gardner and Altman, 1986 Nakagawa and Cuthill, 2007) rather than testing, the table 



remains a staple among statistical methodologies. 

Table 1: One-way analysis of variance. 



Source 


Df 


Sum Sq 


Mean Sq F value 


Pr(>F) 


Factor A 
Errors 


rij — 1 
n — rij 


SSA 
SSE 


MSA F=M|| 
MSE 


Pr(-^nj-l,n-n/ > F) 



2.1.2 Random Effects 



In addition to the statistical model ([!]) and distributional assumption (|4]), there is an 
additional assumption on the factor levels. 



ai ~ iV(0, cr^), 2 = 1, . . .n/. 



(6) 



with ctj and eij independent. Observations are then normally distributed with mean and 
variance 



CoY{Yij,Yi,ji) = < 



This parameterization has the added benefit that the parameter space for the factor levels 
is reduced from rij to 1, since only o"^ is estimated. Although individual levels, a^, may 
be predicted if necessary. Averaging over replications at factor level i yields the mean Yi., 

2 

which are independently distributed N{jj, a^J, where = + Thus, the likelihood 
is a function of the three parameters /i, cr^, and cr^. 

Analogous to (|5]), the initial inquiry of interest is generally concerned with whether 
greater population variance cr^ is significantly different from zero. This corresponds to 
the null hypothesis 



Ho:ai = 0, 



(7) 



and is tested using the same F-statistic as for ^ (Searle et al, 1992 Rao, 1997 Cox and 



Solomon, 2003). Aside from its unintuitive nature, in that despite being random vs. fixed 



the same test statistic is used, this hypothesis test does little to remark on the practical 
significance of the variation due to factor a. Namely, the hypothesis may be rejected 
even when variation due to the errors is substantially greater, as seen in the example of 
Section Hil 

Further inferential procedures on the variance components themselves are typically 
based on method of moments estimators, or explicitly use the likelihood. In the latter 
case, variability of the variance components are estimated with the Hessian of the likeli- 



hood, as with the widely used R packages nlme (Pinheiro et al., 2006) and lme4 (Bates 



and DebRoy, 2004). Wald-type confidence intervals can be then used to obtain confidence 



regions for the parameters. Similarly, the asymptotic properties of the log likelihood can 
be utilized to obtain confidence intervals using the distribution, as seen in Figure [2] of 
Section SH 



2.1.3 Issues and Concerns 

The choice to use a fixed or random effects model is not always immediately clear. The 
terminology alone may be seen as ambiguous since the distinction between fixed effects, 
random effects, and mixed effects is somewhat malleable. The simple fixed effects model 



of Section 2.1.1 can be seen as having a random component in the errors, ej,-. Similarly, 



the random effects model of Section 2.1.2 can be seen as having a fixed component, /i. In 



both cases implying a mixed effects model. In practice a mixed effects model is employed 
when there are two or more factors, other than overall mean and errors, and they are not 
all fixed (random). 

More difficult perhaps is determining when which of these methods should be used. If 
interest lies in the distribution of the random effects, i.e. the variance component cr^, then 
a random effects model should be chosen. If interest lies in the realized/observed levels 
of the factor, then a fixed effects model is used. If both are of interest, then the random 



effects should be chosen and levels are then predicted, rather than estimated. Searle 



et al. (1992, pl8) take a pragmatic approach to this by recommending that in any case 
in which it is reasonable to assume that the levels of the factor come from a probability 
distribution, i.e. that ^ may be assumed, then a random effects model should be chosen. 
The usage of a random effects model, however, typically precludes the estimation of the 
finite population variance. 

An additional problem that arises in analysis of variance with several factors is the 



so-called 'mixed models controversy' (Voss, 1999 Lencina et al., 2005; Nelder, 2008). The 



problem essentially comes down to how a hypothesis test of a random effect is carried 
out when an interaction is also present in the model. 



To resolve the issues above we support the notion of Gelman (2005 ), in that all factors 
in the model are treated as random. The procedural steps are then carried out equiv- 
alently. If interest is in the observed (unobserved) levels of a factor, then the greater 
focus is given to the finite (super) population variance. However, because of parame- 
ter dependencies involved in the unconstrained factor levels, Gelman recommends using 
MCMC, in which redundant parameterization is used in order to reduce dependencies 
and to speed up posterior sampling. Alternatively, we recommend using constraints to 
define an improper joint prior distribution on the factor levels, thereby eliminating the 



need for complex MCMC procedures, as in Geinitz et al. (2012). 



2.2 Multilevel Models 

Often times the results of an analysis should allow for simultaneous consideration of 
both group level and individual level variability, e.g. variability according to schools and 
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to students within schools. Apphcations of such scenarios initially arose in the social 



sciences (Goldstein 1995 Kreft et al, 1998 Snijders and Bosker 2011), but have also 



included the health sciences (Von Korff et al. , 1992 Greenland, 2000), and have provided 
the basis for much of the work in multilevel models. 

A multilevel model can be seen as a linear model with coefficients, i.e. factor levels. 



that are themselves modeled (Gelman and Hill, 2006). More generally, this can be con- 
sidered as a type of hierarchical Bayesian approach. However, while not explicit, the 
multilevel point of view is useful in considering a generalized approach towards analysis 
of variance. Because the simultaneous consideration of group and individual level vari- 
ability entails the decomposition of variation according to each of these sources, "ANOVA 



is fundamentally about multilevel modeling" (Gelman, 2005). That is to say, analysis of 
variance from the viewpoint of multilevel models allows for both finite population and 
superpopulation variance components to be considered, which can be seen as a unification 
of fixed and random effects. This comprehensive approach to analysis of variance yields 



useful results and has been used in other fields such as ecology (Qian and Shen, 2007), 



genetics (Leinonen et al, 2008), and climate (Sain et al. 2011). 

In practice there have been some hindrances in the adoption of this more general 
approach to ANOVA. Computational procedures to carry out such an analysis typically 
rely on either mixed effects models (e.g. Ime4 package in R) or on MCMC methods (e.g. 
WinBUGS). However, while mixed effects models can be used to obtain initial estimates 
of the parameters in a multilevel model, inferential steps, e.g. confidence intervals, for 
variance parameters are often done through likelihood approximation. For more explicit 



inferential procedures it is necessary to use MCMC methods (Gelman and Hill 2006 



p. 566). Although the added complexity and computation of MCMC, particularly when 
the use is as an exploratory analysis step, can be a deterrent to this approach. A method 
that is both precise in its inferential statements while being straightforward to implement 
is not widely known. 
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2.3 Bayesian Results 

The hierarchical approach towards analysis of variance can be explained most readily 
in a Bayesian framework. In an effort to explain this approach in a classical inference 



framework Gelman (2005) recommends a simulation approach, which is reminiscent of 



posterior sampling. Because we prefer to adopt an explicit Bayesian approach, we now 
review some results on distributions for variance components that facilitate the procedure. 



2.3.1 Prior Distributions 

For the normally distributed random variable Y ~ N(/i, cr^), prior specification of the 
parameters can be done in many different ways. Initially, consider fi, to be either 



known or unknown, each in turn. Following the invariance principle (Jeffreys, 1946), 
prior distributions in univariate cases are then 

/i known, unknown: p{fi) oc const, 
/i unknown, o"^ known: oc (o"^)"^. 



Box and Tiao (1992, p. 43) derive same priors using the concept of location and scale pa- 



rameters. These identical priors are also found using the reference approach of Bernardo 



and Smith (2000, p314), due to asymptotic normality of the posterior distributions. 
Note that the density of p(cr^) = (o"^)~^ corresponds to an inverse-gamma distribution, 
r~^{u,v), with u = V = 0. Common values of hyperparameters have thus been limiting 
forms thereof, such as u = v = e, with e small (Lunn et a/., 2000). If prior independence 



between /i and is assumed, then the two univariate priors are combined for 



2\-l 



p{fx,a^) =piMa^) oc (or 

Alternatively, Jeffreys' prior for multivariate parameters = (yU,(T^)"^ without indepen- 
dence leads to 



p(/i,a2)oc(a^)-3/^ 



(9) 



These correspond to cr^ ~ F ^{u,v), with u 



for the prior given by ([8|, and 



u 



^, w = for the prior given by (|9|. 
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Box and Tiao (1992, p. 251) decompose the likelihood by group means, e.g. Yi in 
(|2|, to place a prior directly on o"^^. The joint prior distribution for /i, cr^, o"^^ is then 



2,2 ^-l 



(10) 



Additionally, Jeffreys' independence prior of the original variance parameters {a'^,al 



also leads to (10) (Box and Tiao, 1992). The multivariate analog of this has been used 



as well by Everson and Morris (2000). Naturally, a prior may also be placed directly on 



the parameter o"^, although the posterior may no longer be as simple to work with. 



2.3.2 Conjugacy 

For observations, Yi ~ N(/i, cr^ 



1, . . . ,n, a multivariate conjugate prior for the pa- 



rameter 6 = {fi, a'^Y is a normal-inverse-gamma distribution, denoted as NF ^(/xq, r, u, v) 



with yUo E 7l,T,u,v > 0. More specifically. 



/i I (T^ ~ N(/io, 



cr 

r 



F-Vn, v) 



(11) 
(12) 



with joint density. 



p(/i,a2) =p(^|(j2) .p(c^2^) 

,2 



(27r 



r 



.-1/2 



exp 



■2^^^"^°)'; F(«) 



>2)-«-iexp 



(13) 



Priors corresponding to ([s]) and ^ are then denoted by NF~^(0, 0, —1, 0) and by NF~^(0, 0, 0, 
respectively. Conjugate priors of this form have been used extensively, although often 
with precision, r = (cr^)~^, resulting in a normal-gamma distribution (Bernardo and 



Smith, 2000[ p. 136). The utility of this general parameterization is in being able to con- 



form to different prior specifications while maintaining conjugacy. The full model with 
likelihood and prior is posterior is 



(14) 
(15) 
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with posterior distribution given by 



r~Nr ,T + n,u+-,v + - 

T + n 2 2 



3 Comprehensive ANOVA 



(16) 



Following the view of Gelman (2005) we see the hierarchical Bayesian approach towards 



ANOVA (Section 2.2) as a means to unify the two distinct fixed and random effects 



models. In addition to the hierarchical model structure a Bayesian model specification 
is intuitive and practical. By following this approach the challenges discussed in Sec- 
tion [2rL3] are resolved. 

Hierarchical Bayesian models are typically considered simply as mixed effects models 
within the statistical community. However, because mixed effects models do not typically 
provide assessments of uncertainty of the variance component estimates, nor is variability 
of the observed set of factor levels examined by default, we do not see this as truly 
providing a comprehensive approach towards ANOVA. As stated, multilevel modeling 
seems to be a more natural strategy. As a result much of the work with multilevel models, 
including analysis of variance according to various factors, has largely taken place in other 



domains, primarily the social sciences (Goldstein 1995 Gelman and Hill 2006 Snijders 



and Bosker, 2011). This can be seen as a failure of the statisticians, as Huber (2011) 



states, "the consequences of not being able to adequately summarize and disseminate 
common methodologies may be a divergence of statistics, that each field develops its own 
version of statistics." By presenting ANOVA in a more general hierarchical framework 
we are also, "unifying the philosophies, concepts, statistical methods, and computational 



tools" (Lindsay et al. 2004). 



The unification of fixed and random effect models is clearly seen in the graphical model 
of Figure [T| The successive layers of distributional assumptions is shown clearly here. The 
inner-most box represents the fixed effects model, while the middle box represents the 
random effects model. The hierarchical Bayesian ANOVA model, or simply comprehensive 
ANOVA, is then represented by the outer- most box. The diagram explicitly shows the 
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ANOVAg 



ANOVA2 



0-^ 



ANOVAi 



0' 




a. 




Y,, 



Figure 1: Graphical model representing successive assumptions for the fixed effect 
(inner box), random effect (middle box), and fully Bayesian (outer box) specifica- 
tions. 



unification of the models and immediately conveys a general view of ANOVA to students 



and researchers not familiar with variance analyses. The notation created by Eisenhart 



(1947) is here, where ANOVAi corresponds to Mi, the fixed effects model; and M2 to 



ANOVA2, the random effects model. Instead of M3, which refers to a mixed effects model, 
we have chosen to allow ANOVA3 to refer to a fully Bayesian parameterization. This 



can be confusing though, as Cox and Solomon (2003) have pointed out, "Occasionally 



the word Bayesian is used for any analysis involving more than one level of random 
variation." We agree with them, in that this can seem quite confusing, but nonetheless 
consider ANOVA3 as a Bayesian analysis of variance procedure. 

Analysis of variance in this framework allows the questions discussed in the Intro- 



duction to be addressed, and also resolves many of the issues discussed in Section 2.1.3 



Gelman (2005) presents graphical summaries of this ANOVA approach that allow for vi- 



sual comparison of confidence intervals for the variance components, which is possible for 
both finite and superpopulation variance parameters. In Table [2] a proposed alternative 
to the traditional ANOVA table is shown. Commonly significance in the classical ANOVA 
table is merely a function of power. That is, given enough observations, nearly any ef- 
fect will be deemed as statistically significant. Alternatively, Table [2] provides estimates 
of the variance parameters, both finite and superpopulation, as well as a probabilistic 
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assessment of practical significance. This is done with a direct comparison of posterior 
distributions of all variance components against the error variance a^. A probability re- 
garding hypothesis ([T]), i.e. that the superpopulation variance is equal to zero, can be 
given as well. This probability, Pr(cr^ = 0|y), thus provides an assessment of statistical 
significance. 

3.1 One-way Model 

In the case of a single source of variation, as in model ([T]), the model can be stated as 
Yij = Ui + tij. To illustrate the basic point of view of our ANOVA approach we first focus 
on this problem. 

3.1.1 Model Specification 

A particularly useful parameterization in the one-way configuration, that allows different 
variance parameterizations while maintaining conjugacy, is an extension of the normal- 
inverse-gamma distribution. This involves an additional inverse-gamma distribution for 
the added variance component. The resulting distribution is given by 

ai|ao,ro,r„cr^,cr,^~N ao, "I + -| |> (17) 

^Lke -r-^K, v^), (18) 
(T2~r-i(u„ V,), (19) 

Table 2: Comprehensive ANOVA summary utilizing posterior distributions to ob- 
tain a summary of variance parameters (in units of standard deviation). Quantiles 
are used to provide a type of confidence interval. The probability Pr (cTq, > CelY) 
provides an assessment on practical significance for the parameter. 



Parameter 


Mean 


Median 


Uncertainty Interval 


Sig. Rel. to Errors 


a (finite) Sa 


nsa\Y] 


Qo4Sa\Y] 


{Qom5[Sa\Y], Qo.975[Sa\Y]) 


Pr(s, > a,\Y) 


(super) (To 




Qo.5K\Y] 


iQombWalY], Qo.975[(^a\Y]) 


Pr((T, > a,\Y) 




E[a,\Y] 


Q0.5We\Y] 


iQom^WelY], Qo.975We\Y]) 
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with i = 1, ... ,72/ corresponding to the number of groups, and additional parameter 
such that (Tq^ = + i^e<^e- Noting that is analogous to the variance of mean of the 



observations at an individual factor level, as in Section 2.1.2 



The variance parameters and factor levels can then be jointly specified as a combina- 
tion of normal and inverse gamma distributions, i.e. Nr~^r~^(ao, ^e, i^t, Ua, Va,Ue,Ve), 
with certain values of hyperparameters, or limits thereof, yielding different prior specifi- 
cations such as those discussed earlier. 

This is in general, however, not a conjugate model specification, i.e. the posterior dis- 
tribution will not be of the same family as the prior distribution. Although the posterior 
cr^ continues to follow an inverse-gamma distribution, with density 



pia', I Y) oc {al 



exp 



1 



at 



the posterior distribution of cr^ + K^a"^ will not be the same type as its prior for arbitrary 
values of hyperparameters r^, k^. More specifically, the posterior density of is 



p{al\Y,al)^{al + K,al) 
X exp 



-Ua-lf 2 _|_ 



-ni/2 



^ae raj « / 



KeCr^ 2 , 

where = (% + ^)~^, which can be seen as a type of shifted inverse-gamma distribu- 
tion. However, rather than normalizing the posterior density so that it is proper when 
constrained to non-negative values, it is often more informative to consider a mass point 
at zero; allowing for the hypothesis ^ to be tested (see Section [i]). 

Each individual at is however normally distributed, with posterior density 



I (^1,(^1) oc Q^^exp 



Qa 



Oii 



t nj ^ 



where Qa 



1 I nj 
-2 "T „2 



[a _|_ Te+raj 
t2 "T „2 ■ 



3.1.2 Conjugate Prior 



Setting = 0, Kg = for the prior and likelihood 



(ai,al,a'^) ~ NT ^(oq, ^q, n = 0, 



Ta 



nj 



1 '^ai 'Vai f^J, 



Yij I ai,al ~ N(ai, cr, 



(20) 
(21) 
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gives way to the posterior distribution 



1t^-1 



-\ -1 



nj 2 



n — nj 



ij / 



(22) 



thus maintaining conjugacy. Particularly beneficial is that with this factorization there 
is no need for MCMC sampling. Rather, posterior draws can be taken immediately 
without burn-in nor thinning. A single realization from the joint posterior is found by 



sampling from p(cr^|F), ]9((T^|F, o"^), and then o"^, cr^) using (19), (18), (17) with 

the parameters updated using the observations. 

Selecting appropriate values hyperparameters can then be done as follows. For in- 
variant priors of variance parameters = = = fe = is used. To maintain 
conjugacy = 0, = ^ is used. Practical values of Tq, and ao may then be found using 
an empirical Bayes approach and yield = 1 and ao = ^^7 ^ Xli i-^- the overall mean. 

For more general models, with a mean term, additional factors, interactions, etc., it 
is possible to consider several such normal-inverse-gamma-inverse-gamma distributions, 
where the single inverse-gamma distribution of the errors, a^, is common to all. One may 
then use a prior distribution for the factor levels under a linear constraint so that the 
posterior distributions can also be factored similarly. This allows not only for conjugacy, 
but also facilitates computation in a way that even for models with many factors, samples 
from the posterior can efficiently be drawn without the need for MCMC. This is seen in 



Geinitz et al. (2012). 



4 Examples 



4.1 Rails Data 



For illustration of the various analysis variance methods consider the balanced one-way 



design for data consisting of six railway rails (Devore, 2000 Pinheiro and Bates, 2000). 



Each rail has been measured three times for the amount of time that it takes a certain 
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type of ultrasonic wave to travel the length of the rail. The objective of any initial 
analysis is most likely to investigate the (a) variation due to measurement error and (b) 
variation due to the rails themselves in terms of both statistical and practical significance. 
Additionally, one may be interested predicting travel time for a future measurement. This 
can be considered for either (c) one of these rails as well as (d) a future rail that has not 
yet been seen. For this one-way analysis we consider the simple cell-means model 

Yij = ai + eij, 2 = 1,..., 6, j = 1,2,3. (23) 
4.1.1 Conventional Methods 

For this one-way model error terms are assumed to be iid N(0,(T^), and group terms 
ai as unknown constants, so that the observations are 

F,, |«„cr2~N(«„a,2). (24) 

The model assumes that the six rails that have been observed are the only rails that are 
of interest. This is a fixed effects model, which is to say that the population of rails has 
been exhausted by the sample. 

Questions (a) and (b) can be reasonably addressed using this model, although purely 
from a statistically significant point of view. Results (see Table [3]) indicate that the 
hypothesis ([s]) should be rejected, but are not able to say anything explicitly about the 
practical significance of the rails. 



Table 3: One-way ANOVA of Railway Rails 





Df 


Sum Sq 


Mean Sq 


F value 


Pr(>F) 


Rail 


5 


9310.50 


1862.10 


115.18 


0.000000 


Residuals 


12 


194.00 


16.17 







Question (c) could be answered by looking at the standard error for the estimate Sj, 
to obtain an estimate of the expected travel time. Question (d) can, however, not be 
answered because of the assumed fixed effect. To address this question the rails must be 
considered as a random effect, i.e. assumed to come from a greater population of rails. 
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Alternatively a random effects model may be used, in which the between-rail variabil- 
ity for a large, potentially infinite (super) population of rails is of interest. The hypothesis 
to be tested is then ^ and, as discussed in Section 2.1.2, uses the same F-statistic as 



for the fixed effects model. A more informative summary is often to identify a confidence 
region of the variance components, cr^, erf, as seen in Figure |2} 

4.1.2 Comprehensive ANOVA 

Analogous to the ANOVA summary provided by Table [3} but including both finite and 
superpopulation variances. Table |4] presents a clearer view on the practical significance of 
the rails. Figure [3] similarly summarizes the analysis. From the graphical plot statistical 
significance is suggested by the fact that the intervals do not extend to cover 0. Because 
variance parameter must be nonnegative, it is preferable to assemble a mass point at 
when the posterior has negative support. This allows for the probability p(o"f = 0|y) to 
be used to test the hypothesis of ([T]), which for this dataset has probability zero. 

4.2 Simulated Data 

In the following example a comparison of practical and statistical significance is illustrated 
using both classical ANOVA as well as the more comprehensive Bayesian ANOVA. Data 




10 20 30 40 50 

Figure 2: Comparison of confidence regions for superpopulation standard de- 
viations based on approximation of relative log-likelihood (solid) and using 
the highest-posterior-density (dotted). Contours correspond to confidence levels 
0.50,0.75, and 0.95 (small to large). 
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Table 4: Bayesian ANOVA Table: Posterior distributions are used to obtain two 
estimates of the variability. Quantiles provide an assessment of the uncertainty in 
these estimates. The probability, e.g. Pr (cTq, > a^), provide a relative comparison 
of each variance parameter to the measurement variability. 

Parameter Mean Q0.5 (Q0.025, Q0.975) Pr{> a,) 

Rails (finite) s« 24.69 24.71 (22.46,26.83) 1.000 

(super) a„ 25.96 23.89 (14.55,49.20) 1.000 

Errors a, 4.27 4.10 (2.87,6.58) 




\ \ 1 \ n 

10 20 30 40 50 

Travel Time (ns) 

Figure 3: Graphical summary of posterior quantiles for variance component (super 
and finite population) parameters. 

of the form Yij = ai + eij is generated where ttj ~ N(0, cr^) and e^j ~ N(0, cr^ = 1), for 
i = 1, . . . , rij, j = 1, . . . , n J for a total of n = rij ■ rij observed values. This is done under 
two distinct cases 

Case A: = 1, rij = 6, 

Case B: = 2, nj = 2, 
and with rij = 5 for both. Using the conventional ANOVA method (Table [s]) there 
is not any discernible differences between the two datasets. Statistical significance is 
approximately equivalent because of the balance of statistical power and the difference 
in the variance components and a^. 

By assembling a mass point at zero whenever the posterior has support for negative 
values it possible to use the probability p(o"^ = 0|F) to test the hypothesis of ([T]). In- 
terestingly, this posterior probability is 0.0263 for case A and 0.0258 for case B, values 
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which are comparable to their corresponding p-values in Table [5j This is also noted by 
the end-points corresponding to the 0.025 level of uncertainty for the intervals shown in 
Figure |4} 

A more informative and comprehensive summary of the data is provided by the 
Bayesian ANOVA table (Table [6]). This provides not only estimates of the variance 
components, but also an indication of the practical significance of the factor a when 
observed with error e. 

Table 5: Classical ANOVA table to summarize the decomposition of variance. 
Case A (left), with ni = b,nj = 6,cr^ = \, represents low practical significance 
of factor a. Case B (right), with rti = 5,nj = 2,cr^ = 2, represents strong 
practical significance. Despite practical differences between the two cases, p-values 
are nearly equal. 



Df 


Sum Sq 


Mean Sq 


F value 


Pr(>F) 


Df 


Sum Sq 


Mean Sq 


F value 


Pr(>F) 


a 5 


9.70 


1.94 


3.08 


0.0267 


a 5 


27.69 


5.54 


7.31 


0.0239 


€ 25 


15.75 


0.63 






€ 5 


3.79 


0.76 







Table 6: Bayesian ANOVA tables to summarize the variance decomposition. Case 
A (left) and Case B (right) illustrate a situation in which factor a has low, or high 
practical significance. 



Farm 


Mean 


Qo.5 


(Qo.025: Q0.975) 


Pr(> <T,) 


Farm 


Mean 


Qo.5 


(Qo.025: <9o.975) 


Pr(> <Te) 


a Sa 


0.48 


0.49 


(0.02,0.84) 


0.07 


a Sa 


1.59 


1.66 


(0.12,2.30) 


0.84 


Ca 


0.57 


0.51 


(0.02, 1.34) 


0.19 




1.74 


1.61 


(0.11,3.84) 


0.82 


e CTe 


0.81 


0.80 


(0.62,1.07) 






1.04 


0.92 


(0.52,2.30) 





5 Discussion 

The major contribution of this paper can be seen as ideological in nature, in that the 
standard method of analysis of variance is treated as a useful procedure to practitioners. 
As a result, although rigorous treatment is given, the discussion has been restricted to 
relatively simple designs. Extending from a one-way balanced ANOVA to many factors 
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^ \ 1 \ r-^ ^ \ 1 1 

01234 01234 

Figure 4: Posterior uncertainty intervals of variance parameters shown graphically 
for both cases. Thick hne segments correspond to 50% uncertainty and thin line 
segments to 95%. Vertical marks denote the posterior median. A point at the end 
of an interval denotes evidence that the variance component is zero and can be 
compared to the corresponding hypothesis test. 

can be considered as a trivial step from here. However, much more work is needed in 
order for the method to be able to be widely accepted. Issues of unbalanced designs, non- 
orthogonal predictors, and generalized linear models are all necessary for the widespread 
usage of any statistical method. Therefore these are all issues that are to be examined in 
greater detail in the future. 
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